Radiomics analysis for prediction and classification of submucosal tumors based on gastrointestinal endoscopic ultrasonography

Abstract
Objectives: To identify and classify submucosal tumors by building and validating a radiomics model with gastrointestinal endoscopic ultrasonography (EUS) images.
Methods: A total of 144 patients diagnosed with submucosal tumors through gastrointestinal EUS between January 2019 and October 2020 were included. A total of 1952 radiomic features were extracted from each patient's EUS images. A statistical test and a customized least absolute shrinkage and selection operator regression were used for feature selection. Subsequently, an extremely randomized trees algorithm was used to construct a radiomics classification model tailored to gastrointestinal EUS images. The performance of the model was measured by the area under the receiver operating characteristic curve.
Results: The radiomics model comprised 30 selected features that showed good discrimination performance in the validation cohorts. During validation, the area under the receiver operating characteristic curve was 0.9203, and the mean value after 10-fold cross-validation was 0.9260, indicating excellent stability and calibration. These results confirm the clinical utility of the model.
Conclusions: Using a dataset curated from gastrointestinal EUS examinations at our collaborating hospital, we have developed a well-performing radiomics model. It can be used for personalized and non-invasive prediction of the type of submucosal tumors, providing physicians with aid for early treatment and management of tumor progression.


INTRODUCTION
With the advancement of endoscopic ultrasonography (EUS) technology, the detection rates of submucosal tumors (SMTs) within the gastrointestinal tract have significantly increased.[1] These tumors, emerging from the pathological transformation of non-epithelial mesenchymal tissues within the gastric wall, represent a diverse group of neoplastic lesions. They include gastric submucosal leiomyoma (GSL), neuroendocrine tumors (NETs), gastric ectopic pancreas (GEP), gastrointestinal stromal tumors (GIST), gastric mucosa lipoma (GML), and hemangiomas, among others, with GSL and GIST being notably prevalent.[2] The detection of SMTs often occurs incidentally during gastroscopy or is prompted by clinical symptoms related to the tumor's size and location, such as palpable abdominal masses, gastrointestinal bleeding, or abdominal pain. The European Society of Gastrointestinal Endoscopy guidelines from 2022 highlight the significance of early diagnosis in enhancing treatment outcomes by outlining indications for SMT treatment based on malignancy risk, symptomatic presentation, and considerations for patients undergoing bariatric surgery.[3,4] Despite its benefits, traditional endoscopy primarily offers a superficial examination, limited in its capacity to visualize the deeper mucosal layers. Consequently, biopsy samples obtained during such procedures might not yield sufficient pathological information for an accurate determination of the tumor's nature, especially given the challenge of distinguishing between benign and malignant lesions.[5] EUS emerges as a superior modality by integrating the benefits of ultrasound and endoscopy, enabling detailed visualization of SMTs' external morphology and probing their origin layers, which is crucial for accurate pathology assessment and prognosis estimation.[6,7] Additionally, EUS provides invaluable insights into membrane integrity and other prognostic factors, thereby predicting surgical outcomes and enhancing patient management.[8] This study leverages EUS by concurrently acquiring white light images (WLI) and ultrasound images, which play a pivotal role in the precise identification of tumor types.
The digital age has led to an unprecedented gathering of patient and tumor data for research.[10][11] Radiomics transforms EUS images into detailed data, revealing patterns in tumor traits that traditional imaging might miss.[12,13] As a revolutionary technique, radiomics converts images into detailed features that indicate pathological and physiological conditions.[14,15] Employing advanced algorithms, radiomics advances precision medicine, though challenges remain in representing SMTs with traditional features in our dataset. This study introduces wavelet and Laplacian of Gaussian (LoG) filters for better feature visibility, demonstrating their importance in classifying tumor types.[16] While traditional diagnostic methods like EUS-guided fine-needle aspiration and unroofing biopsy provide direct tissue analysis, radiomics offers a non-invasive alternative that analyzes the entire lesion, with potential for higher diagnostic accuracy, and complements these methods by identifying patterns not visible through conventional techniques.
We introduce a radiomics model underpinned by customized least absolute shrinkage and selection operator (MyLASSO) regression and the extremely randomized trees (Extra Trees) algorithm. Utilizing EUS-obtained ultrasound images as model inputs, we adopted the area under the receiver operating characteristic curve (AUC) as the evaluation criterion. Our objective was to assess the ultrasound radiomics model's feasibility in accurately predicting various tumor types and to identify key features that ensure model stability and precision.

Patients
The data were obtained from the Gastrointestinal Endoscopy Center of Shanghai Sixth People's Hospital, affiliated with Shanghai Jiaotong University. They included 2042 images from 144 patients, gathered between January 2019 and October 2020. The cohort spans various age groups and genders, ensuring a broad representation. The tumor cases included 57 GSLs, 39 GISTs, 10 NETs, 10 GEPs, and 15 GMLs. Thirteen atypical cases were omitted from the analysis due to limited representation, inadequate sample size, or other factors (Figure 1). All patients provided informed consent, and the study adhered to the ethical guidelines of the 1975 Declaration of Helsinki (6th revision, 2008), as confirmed by prior institutional committee approval. All patients underwent multimodal ultrasonography, including WLI and EUS. The final diagnoses for all patients were based on histopathological results from biopsies, including both surgical and endoscopic biopsies.
The data encompass patients' age, gender, and lesion location. The initial images, with a resolution of 764 × 572 pixels, contained extraneous details such as dates, machine parameters, and the corresponding WLI. These images were cropped to 388 × 457 pixels (Figure 2). All ultrasound images were processed with wavelet and LoG filters to minimize ultrasound noise before feature extraction. Tumor location annotations and region of interest (ROI) segmentation were performed manually by skilled ultrasound doctors.
Inclusion criteria were as follows: (1) age ≥18 years; (2) confirmed diagnosis of SMTs (including GSLs, GISTs, NETs, GEPs, and GMLs); (3) preoperative EUS examination; (4) a determinate tumor type in the dataset (patients with indeterminate tumor types were excluded). More details are shown in Figure 1. In total, 131 patients were included in our study. The patients were randomly divided into a training group (n = 105) and a testing group (n = 26) in an 8:2 ratio. The clinical characteristics of all patients are shown in Table 1. The baseline characteristics of patients in the training and test cohorts are shown in Table 2. There were no statistically significant differences between the two cohorts.
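The 8:2 random division described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `split_patients`, the fixed seed, and the use of integer patient IDs are assumptions introduced here, and the split is assumed to be made at the patient level.

```python
import random

def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Shuffle patient IDs and split them into training and testing groups."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# 131 included patients, identified here by hypothetical integer IDs
train_ids, test_ids = split_patients(range(131))
print(len(train_ids), len(test_ids))  # 105 26
```

Rounding 0.8 × 131 gives 105 training patients and leaves 26 for testing, which matches the group sizes reported above.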

Pathologic assessment of response
All lesion annotations were performed by more than three experienced pathologists who conducted histopathological examination and analysis to determine the tumor type. The final pathology results were then reviewed by dedicated gastrointestinal pathologists for further assessment, ensuring accurate pathological findings.

Tumor masking
Each patient's EUS images (including WLI) were analyzed by two radiologists (Dr. Li, a radiologist with 10 years of experience in tumor imaging, and Dr. Xia, a physician with 7 years of experience in tumor imaging). Both manually drew the regions of interest on the images while blinded to the histopathological findings (Figure 3), including the entire tumor except the bowel lumen.

Radiomic feature extraction and statistical analysis
All EUS images were taken with an OLYMPUS EU-ME3 at frequencies of 7.5 and 12 MHz. …tion displaying various frequency scales and different feature orientations within the tumor volume.[17] These features have been widely used in previous radiomics studies.[18,19] Additionally, we categorized SMTs into five types: GSL (Label = 0), GIST (Label = 1), NET (Label = 2), GEP (Label = 3), and GML (Label = 4), and performed the necessary statistical analysis. After removing defective data, we used Pearson's chi-square test to compare tumor type differences based on attributes like age and gender. Statistical analyses were done with IBM SPSS Statistics 26.0, with significance levels at 0.05 or 0.01.

Radiomics model construction
The radiomics signature workflow involves lesion segmentation, feature extraction, feature selection, and model analysis (Figure 3). Several machine learning models were compared, and the Extra Trees model was chosen to produce the final tumor type classifications. This algorithm, a variation of Random Forest, chooses split thresholds at random rather than searching for the optimal split on each attribute. This reduces computational time, increases tree diversity, and offers strong generalization and noise resilience.
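The random split rule that distinguishes Extra Trees from Random Forest can be illustrated with a short sketch. This is a simplified, pure-Python illustration of one node split under the usual description of the algorithm, not the study's implementation; the function names, the Gini criterion, and the toy data are assumptions introduced here.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def random_split(rows, labels, n_candidates, rng):
    """Pick the best of a few random (feature, threshold) candidates."""
    n_features = len(rows[0])
    best = None
    for j in rng.sample(range(n_features), n_candidates):
        column = [r[j] for r in rows]
        # The threshold is drawn at random instead of being optimized
        threshold = rng.uniform(min(column), max(column))
        left = [y for x, y in zip(column, labels) if x < threshold]
        right = [y for x, y in zip(column, labels) if x >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, j, threshold)
    return best

# Toy data: four cases, two features, two classes (hypothetical values)
rows = [[0.1, 5.0], [0.2, 4.0], [0.8, 5.5], [0.9, 4.5]]
labels = [0, 0, 1, 1]
score, feature, threshold = random_split(rows, labels, n_candidates=2, rng=random.Random(0))
print(score, feature, threshold)
```

Because no threshold search is performed, each node is cheap to build, and the extra randomness is what increases diversity across trees.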
The λ value that maximized the AUC on the dataset was selected as the regularization parameter. After λ was chosen, the radiomic score for each patient in the validation dataset was calculated using the Extra Trees model. Accuracy, AUC, sensitivity, and specificity were calculated as metrics to assess the quantitative discrimination performance of radiomic features in this dataset (Table 3).

Feature selection method
To minimize overfitting in the final model, we first excluded features without significant differences between tumor types. We then applied the customized LASSO algorithm (MyLASSO), which sets the coefficients of less significant features to zero. In the MyLASSO model, y_i = 0 for patients with GSL, y_i = 1 for patients with GIST, y_i = 2 for patients with NET, y_i = 3 for patients with GEP, and y_i = 4 for patients with GML; n is the total number of features used in the model; x_ij (j = 1, 2, …, n) are the features of case i, where i indexes the cases; β_j (j = 0, 1, 2, …, n) are the parameters of the model; and ε is the error term.
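A linear form consistent with the definitions above can be written as follows. The symbols β_j for the parameters and ε for the error term are conventional choices assumed here, with β_0 acting as the intercept.

```latex
y_i = \beta_0 + \sum_{j=1}^{n} \beta_j x_{ij} + \varepsilon
```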
TABLE 3 Ten-fold cross-validation of Extra Trees (full features and selected features).

The MyLASSO algorithm uses regularization to shrink parameter values, simplifying the model while keeping key features. Its objective function has a data-fitting term (minimizing squared residuals) and a penalty term (penalizing the absolute values of the model parameters). This penalty increases with the regularization parameter λ, making the model focus on fewer crucial features as λ rises. Its optimization problem can be expressed as minimizing a cost function in which y_i is the outcome for patient i, N is the number of patients, S is the sigmoid function, x_ij is the j-th feature of the i-th patient, and λ is the regularization parameter. The sigmoid function S is defined as S(z) = 1/(1 + e^(−z)). The MyLASSO penalty |β_j| was applied in order to set some parameters β_j to zero, which generates a sparse model. Finally, the algorithm selected the features with greater contributions. Although tumor sizes varied little, GSL and NET were typically smaller, while GIST, GEP, and GML were larger. Different tumors had distinct ultrasound echoes, underscoring the value of in-depth analysis of ultrasound image features.
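One cost function consistent with this description (squared residuals passed through the sigmoid, plus an absolute-value penalty) is sketched below. The 1/N normalization, the placement of S around the linear predictor, and the exclusion of the intercept from the penalty are assumptions, not details confirmed by the text.

```latex
J(\beta) = \frac{1}{N} \sum_{i=1}^{N} \Bigl( y_i - S\bigl(\beta_0 + \sum_{j=1}^{n} \beta_j x_{ij}\bigr) \Bigr)^{2}
         + \lambda \sum_{j=1}^{n} \lvert \beta_j \rvert,
\qquad
S(z) = \frac{1}{1 + e^{-z}}
```

As λ grows, the second term dominates and more of the β_j are driven exactly to zero, which is how the sparse feature subset arises.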

Model selection
The dataset consists of 475 lesion images, with 428 for model training and 47 for evaluation. Each candidate model used all 1952 features extracted from each image, and performance was assessed using the AUC. In radiomics, AUC is generally regarded as a more informative metric than accuracy.[20] After examination, the Extra Trees Classifier achieved the highest AUC value of 0.8998, making it the final classification model.
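To make the evaluation criterion concrete, the sketch below computes a binary AUC as the fraction of positive/negative pairs that the model ranks correctly. It is an illustration only: the text does not state how the multi-class AUC was aggregated, and the function name and the scores are hypothetical.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical model scores for three positive and two negative cases
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
print(binary_auc(scores, labels))  # 5 of the 6 pairs are ranked correctly
```

Unlike accuracy, this quantity does not depend on a single decision threshold, which is one reason it is preferred when class sizes are unbalanced.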
To further validate the stability of the Extra Trees model, 10-fold cross-validation was performed on the dataset. This approach helps reduce the impact of random variations and detect and mitigate the risk of overfitting. It helps to capture the performance of the model on the entire dataset and provides an objective assessment of its generalization ability.[21] From Table 3, the average AUC is 0.9123 (0.8350-0.9650), indicating good stability of our model (Figure 4).
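The fold construction behind 10-fold cross-validation can be sketched as below. This is a generic illustration, assuming the folds are formed over the 475 lesion images; the function name, the seed, and the absence of stratification are assumptions made here.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(475))
print(len(splits), sorted(len(test) for _, test in splits))
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`; the reported average is then the mean of the ten per-fold scores.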

Feature selection and radiomics model construction
This study utilized a variety of statistical methods to analyze medical imaging data. Weighted feature ranking was conducted on all attributes, as feature extraction is pivotal in data processing. We adopted a multi-stage feature extraction method.
The independent samples t-test was used to assess relationships between features. Multivariate regression analysis evaluated the significance of each feature's regression coefficient. We organized the p-values in ascending sequence, and the top 33% of features (around 633 features) underwent further scrutiny. This phase aimed to discard non-influential features, minimizing data noise and redundancy.
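The ranking step, sorting features by p-value and keeping the top fraction, can be sketched as follows. The function name, the feature names, and the p-values are hypothetical; only the 33% fraction comes from the text.

```python
def select_by_p_value(p_values, keep_fraction=0.33):
    """Return the feature names with the smallest p-values, in ascending order."""
    ranked = sorted(p_values, key=p_values.get)
    n_keep = max(1, int(keep_fraction * len(ranked)))
    return ranked[:n_keep]

# Hypothetical p-values for ten features
p_values = {
    "f0": 0.20, "f1": 0.001, "f2": 0.04, "f3": 0.70, "f4": 0.03,
    "f5": 0.50, "f6": 0.90, "f7": 0.15, "f8": 0.33, "f9": 0.08,
}
print(select_by_p_value(p_values))  # ['f1', 'f4', 'f2']
```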
To further optimize the feature subset, we applied MyLASSO regression to the remaining features for dimensionality reduction. We experimented with different values of λ ranging from 10^-5 to 10^2 and recorded the coefficients of the MyLASSO models corresponding to each value (Figure 3). The optimal λ value was selected as 2.54 × 10^-3, and the relationship graph (Figure 3) between MSE and λ demonstrated that this particular λ value indeed produced the best results. These coefficients represent the importance of features in the model. We identified 30 key features in our study.
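The search over regularization strengths can be sketched as a log-spaced grid from 10^-5 to 10^2 followed by picking the value with the lowest validation MSE. The grid density, the function names, and the MSE curve below are assumptions for illustration; the curve is a stand-in and does not reproduce the study's data.

```python
import math

def log_grid(low_exp=-5, high_exp=2, points_per_decade=4):
    """Log-spaced candidate values from 10**low_exp to 10**high_exp."""
    n_steps = (high_exp - low_exp) * points_per_decade
    return [10 ** (low_exp + i / points_per_decade) for i in range(n_steps + 1)]

def pick_lambda(grid, mse_for):
    """Return the candidate with the smallest validation MSE."""
    return min(grid, key=mse_for)

def hypothetical_mse(lam):
    """Stand-in for a cross-validated MSE curve; not the study's data."""
    return (math.log10(lam) + 2.6) ** 2 + 1.0

grid = log_grid()
best = pick_lambda(grid, hypothetical_mse)
print(len(grid), best)
```

In practice `hypothetical_mse` would be replaced by a function that fits the penalized model at the given value and returns its held-out error.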
To assess the performance of the selected features, we calculated the root mean squared error (RMSE), obtaining a value of 1.374. This indicates that the extracted features demonstrate good predictive performance on the current task.
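For reference, the RMSE is the square root of the mean squared difference between the true labels and the predictions. The sketch below uses made-up values on the 0 to 4 label scale and is not a reproduction of the reported figure.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true labels and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical labels and regression outputs
y_true = [0, 1, 2, 3, 4]
y_pred = [0.5, 1.0, 1.0, 3.5, 2.0]
print(rmse(y_true, y_pred))
```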

Analysis
Using the Extra Trees model, we classified the tumors with the 30 selected features, with results shown in Figure 4a-c. The results show strong classification performance: over 80% accuracy for GSL, GIST, and GEP, and 100% for GML. However, the smaller NET sample size limited its prediction accuracy to 75%. The precision-recall (PR) curve of the model is depicted in Figure 4c. Figure 4d-f shows the performance with the full feature set; the selected features in Figure 4a-c give superior results.
We also evaluated our prediction model with 10-fold cross-validation (Table 3) to gauge its tumor classification proficiency. As Table 3 shows, the model trained on the selected features outperforms the one trained on the full feature set. The elevated AUC implies the feature selection process effectively removed redundant or detrimental features. The model's average AUC of 0.9260 underscores its precision in distinguishing lesion types, which is essential for medical image classification. The model's mean sensitivity (0.7771) and specificity (0.7708) reflect its capacity to correctly identify positive and negative samples, respectively; high sensitivity is crucial for early disease detection and minimizing false negatives.
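Sensitivity and specificity for a multi-class problem are commonly computed one class at a time, treating that class as positive and all others as negative. The sketch below shows this one-vs-rest calculation on made-up labels; how the study averaged these values across classes is not stated, so no averaging is shown.

```python
def one_vs_rest_rates(y_true, y_pred, positive):
    """Sensitivity and specificity when one class is treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    # Assumes the class and its complement both occur in y_true
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical true and predicted tumor labels
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
sens, spec = one_vs_rest_rates(y_true, y_pred, positive=1)
print(sens, spec)  # 1.0 0.75
```

Here both class-1 cases are found (sensitivity 1.0), while one of the four other cases is wrongly labeled as class 1 (specificity 0.75).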
In summary, our radiomics classification model, based on 10-fold cross-validation results, achieves high AUC, sensitivity, and specificity, affirming its accuracy, consistency, and value in tumor diagnosis and research.

DISCUSSION
This study introduces an innovative radiomics model for identifying and classifying SMTs using EUS images. Analyzing EUS images from 144 patients, we extracted 1952 radiomic features. A rigorous process involving statistical testing and a custom LASSO regression identified 30 key features with significant discriminatory ability. These features underpinned a model with impressive diagnostic accuracy, indicated by an average AUC of 0.9260, and showed stability and calibration in validation cohorts.
Radiomics represents a significant step forward in diagnosing complex tumors, marking a first in SMT classification.[22] It effectively captures comprehensive tumor data, offering insights into tumor characteristics previously unachievable with conventional methods.[23] This research has vital clinical implications, particularly in preoperative diagnosis and treatment planning for SMTs, which, despite being mostly benign, can cause psychological stress and require surgical intervention, affecting recovery times.[24] Our findings highlight radiomics' potential in SMT diagnosis automation, contrasting with recent studies like Binglan Zhang et al.'s work on using artificial intelligence for GIST diagnosis via EUS images, and another study's examination of artificial intelligence with contrast-enhanced harmonic EUS for differentiating GISTs from leiomyomas.[25,26] Our approach encompasses a wider range of SMT classifications, demonstrating broader applicability and enhancing diagnostic methodologies in the field.
Compared to traditional diagnostic methods such as EUS fine-needle aspiration, unroofing biopsy, or mucosal incision-assisted biopsy, the methodology presented in this work, through detailed imaging feature extraction, achieves a non-invasive and more comprehensive analysis of lesions. This represents a significant advancement over conventional invasive techniques, which are susceptible to sampling errors. The adoption of radiomics in medical diagnostic processes signifies a shift towards less invasive, more informative diagnostic practices, potentially revolutionizing the clinical decision-making process for SMTs.
Nonetheless, our study is not devoid of limitations. The sample size, albeit sufficient for preliminary analysis, remains modest and warrants expansion in future research to enhance the model's generalizability and robustness. Additionally, our data set comes from a single medical center, which could introduce bias and limit the applicability of our findings across diverse clinical settings. Future studies are encouraged to embrace a multicentric approach, incorporating data from various geographical and demographic backgrounds to fortify the external validity of the model.
Radiomics is rapidly advancing, with improvements in imaging technologies and algorithms expected to enhance radiomic models' diagnostic accuracy and utility. Integrating radiomics with other diagnostic tools like MRI or CT could offer a more holistic approach to diagnosing SMTs.
In conclusion, our radiomics model marks a significant step forward in the non-invasive diagnosis of SMTs, offering a novel tool that potentially aids clinicians in the early detection, classification, and management of these tumors. Despite the highlighted limitations, this study paves the way for future research, emphasizing the need for larger, multicentric studies and the exploration of advanced imaging modalities to enhance the diagnostic landscape for SMTs.

ACKNOWLEDGMENTS
This experimental dataset was provided by the Gastrointestinal Endoscopy Center of Shanghai Sixth People's Hospital Affiliated with Shanghai Jiaotong University. We would like to express our gratitude to Director Xinjian Wan and Dr. Xiangyun Zhao for their great support of this project.

FIGURE 1
Flowchart for the exclusion of patients: minors and patients whose pictures are not clear are excluded first, followed by patients with inconclusive pathology results and patients whose tumor type has fewer than 10 cases or is not a submucosal tumor (for example, hemangioma). The numbers in brackets correspond to the number of cases excluded for each condition.

FIGURE 2
The data set: (a) The pixel size is 764 × 572, including dates, machine parameters, and so on. (b) The pixel size is 388 × 457, without irrelevant information.

5 and 12 MHz. Clinical features

TABLE 1
Clinical and histologic characteristics of the study population.

FIGURE 3
The workflow of the radiomics model construction. First, the dataset was labeled with ROIs after manual segmentation by the doctors; a total of 1952 features were extracted from the images that had been subjected to wavelet filters and LoG filters. Second, the Sankey diagram, statistical test, and MyLASSO were applied to the feature selection. Finally, the radiomics prediction model was constructed based on the selected features by Extra Trees. Abbreviations: MSE, mean square error; ROC, receiver operating characteristic; ROI, region of interest; LoG, Laplacian of Gaussian; MyLASSO, the customized least absolute shrinkage and selection operator.

FIGURE 4
Results of classification predictions by the customized radiomics model: (a) Results of selected features; (b) ROC of the model (selected features); (c) PR curve of the model (selected features); (d) Results of full features; (e) ROC of the model (full features); (f) PR curve of the model (full features). Abbreviations: ROC, the receiver operating characteristic curve; PR curve, the precision-recall curve.

Table 1
tumor type distribution by gender. Continuous variables were analyzed using mean ± SD and t-tests; categorical ones used chi-square tests. Statistical analysis results for the training and test groups, including age, gender, tumor size, and their p-values, are shown in Table 2. Results revealed no significant age or gender differences across tumor types, so these attributes were deemed non-influential in tumor classification. Most tumors predominantly affected females, except for NET due to fewer cases. All intergroup p-values were above 0.05, suggesting no notable associations.